A radix-16 FFT algorithm suitable for multiply-add instruction based on Goedecker method

Author

  • Daisuke Takahashi
Abstract

A radix-16 fast Fourier transform (FFT) algorithm suited to processors with a multiply-add instruction is proposed. On such processors, the proposed algorithm requires fewer floating-point instructions than the conventional radix-16 FFT algorithm. It also requires fewer loads and stores than the radix-2, radix-4, and radix-8 FFT algorithms and the split-radix FFT algorithm. Goedecker's method is used to derive the algorithm. The number of floating-point instructions for the proposed radix-16 FFT algorithm is compared with those of conventional power-of-two FFT algorithms on processors with a multiply-add instruction.
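
The idea behind saving instructions with a multiply-add can be illustrated on a much smaller case than the paper's radix-16 kernel. The sketch below is not the proposed algorithm; it is a radix-2 butterfly, chosen only because it fits in a few lines. It assumes a twiddle factor w = c + i*s with c != 0 and factors out c, so that w*b = c * ((b_r - t*b_i) + i*(b_i + t*b_r)) with t = s/c. Every remaining operation then has the form x ± y*z, which maps onto one multiply-add instruction. The function names `butterfly_conventional` and `butterfly_fma` are hypothetical, and plain Python arithmetic stands in for hardware multiply-add instructions.

```python
import cmath
import math


def butterfly_conventional(a, b, w):
    """Radix-2 butterfly (a + w*b, a - w*b) with separate multiplies and adds."""
    # Complex product w*b: 4 multiplies and 2 adds/subtracts.
    pr = w.real * b.real - w.imag * b.imag
    pi = w.real * b.imag + w.imag * b.real
    # Butterfly: 4 more adds/subtracts, 10 floating-point operations in total.
    return (complex(a.real + pr, a.imag + pi),
            complex(a.real - pr, a.imag - pi))


def butterfly_fma(a, b, w):
    """Same butterfly, rearranged so each line is one x +/- y*z operation.

    Assumption for this sketch: w.real != 0. In a real kernel, c and t
    would be precomputed and stored in the twiddle table.
    """
    c = w.real
    t = w.imag / c
    ur = b.real - t * b.imag   # multiply-add 1
    ui = b.imag + t * b.real   # multiply-add 2
    y0r = a.real + c * ur      # multiply-add 3
    y0i = a.imag + c * ui      # multiply-add 4
    y1r = a.real - c * ur      # multiply-add 5
    y1i = a.imag - c * ui      # multiply-add 6
    return complex(y0r, y0i), complex(y1r, y1i)


if __name__ == "__main__":
    w = cmath.exp(-2j * math.pi * 3 / 16)  # an example twiddle factor
    a, b = 1.0 + 2.0j, 3.0 - 4.0j
    print(butterfly_conventional(a, b, w))
    print(butterfly_fma(a, b, w))
```

Both functions return the same pair up to rounding. The rearranged version needs six operations, each expressible as a single multiply-add, where the conventional form needs ten separate multiplies and adds.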


Similar resources

Fast Radix 2, 3, 4, and 5 Kernels for Fast Fourier Transformations on Computers with Overlapping Multiply-Add Instructions

We present a new formulation of fast Fourier transformation (FFT) kernels for radix 2, 3, 4, and 5, which have a perfect balance of multiplies and adds. These kernels give higher performance on machines that have a single multiply–add (mult–add) instruction. We demonstrate the superiority of this new kernel on IBM and SGI workstations. Key word. FFT kernels AMS subject classifications. 65-04, 4...


Multiply-Add Optimized FFT Kernels

Modern computer architecture provides a special instruction, the fused multiply-add (FMA) instruction, to perform both a multiplication and an addition operation at the same time. In this paper newly developed radix-2, radix-3, and radix-5 FFT kernels that efficiently take advantage of this powerful instruction are presented. If a processor is provided with FMA instructions, the radix-2 FFT algorit...


Optimum Complexity FFT Algorithms for RISC Processors

Modern RISC processors provide a special instruction, the fused multiply-add (FMA) instruction ±a ± b × c, to perform both a multiplication and an addition operation at the same time. In this paper newly developed radix-2, radix-4, and split-radix FFT algorithms that optimally take advantage of this powerful instruction are presented. All floating-point operations of these algorithms are executed as ...


Split-Radix FFT Algorithms Based on Ternary Tree

The fast Fourier transform (FFT) is widely used in signal processing applications. For a 2^n-point FFT, the split-radix FFT requires fewer arithmetic operations than many state-of-the-art algorithms. Most split-radix FFT algorithms are implemented recursively, which incurs considerable system overhead. In this paper, we propose a split-radix FFT algorithm that can eliminate the system overhead...


Radix-4 FFT implementation using SIMD multimedia instructions

In this paper, a fast radix-4 complex FFT implementation using 4-parallel SIMD instructions is presented. Four radix-4 butterflies are calculated in parallel at all stages by loading four consecutive elements into a register. At the last stage, every four elements are packed into a register and calculated in parallel. This regular data flow enables higher parallelism and an overhead reduction in data ...



Publication date: 2003